DPCA: Dimensionality Reduction for Discriminative Analytics of Multiple Large-Scale Datasets

Authors

  • Gang Wang
  • Jia Chen
  • Georgios B. Giannakis
Abstract

Principal component analysis (PCA) has well-documented merits for data extraction and dimensionality reduction. PCA deals with a single dataset at a time, and it is challenged when it comes to analyzing multiple datasets. Yet in certain setups, one wishes to extract the most significant information of one dataset relative to other datasets. Specifically, the interest may be in identifying, namely extracting, features that are specific to a single target dataset but not the others. This paper develops a novel approach for such so-termed discriminative data analysis, and establishes its optimality in the least-squares (LS) sense under suitable data modeling assumptions. The criterion reveals linear combinations of variables by maximizing the ratio of the variance of the target data to that of the remainders. The novel approach solves a generalized eigenvalue problem by performing SVD just once. Numerical tests using synthetic and real datasets showcase the merits of the proposed approach relative to its competing alternatives.
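
The abstract frames the criterion as a variance-ratio maximization that reduces to a generalized eigenvalue problem, so a minimal NumPy/SciPy sketch of that route is given below. The function name discriminative_pca, the ridge term eps, the toy data, and the call to scipy.linalg.eigh are illustrative assumptions; the authors' actual algorithm solves the problem via a single SVD, which is not reproduced here.

    import numpy as np
    from scipy.linalg import eigh

    def discriminative_pca(X, Y, k=2, eps=1e-6):
        """Sketch of discriminative PCA for a target dataset X relative
        to a background/remainder dataset Y (rows are samples).

        k   : number of discriminative components to return
        eps : small ridge added to the background covariance so the
              generalized eigenproblem stays well posed (assumption)
        """
        d = X.shape[1]
        # Sample covariances of the two datasets.
        Cx = np.cov(X, rowvar=False)
        Cy = np.cov(Y, rowvar=False) + eps * np.eye(d)

        # Directions maximizing the ratio u'Cx u / u'Cy u solve the
        # generalized eigenproblem Cx u = lambda Cy u.
        evals, evecs = eigh(Cx, Cy)          # eigenvalues in ascending order

        # Keep the k leading generalized eigenvectors and their ratios.
        U = evecs[:, ::-1][:, :k]
        return U, evals[::-1][:k]

    # Toy usage: the target has an extra variance direction absent from Y.
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(500, 10))
    X = rng.normal(size=(500, 10))
    X[:, 3] += 3.0 * rng.normal(size=500)    # target-specific variation
    U, ratios = discriminative_pca(X, Y, k=2)
    Z = (X - X.mean(axis=0)) @ U             # low-dimensional projection

In this sketch the leading component should align with coordinate 3, the direction where the target data vary much more than the background data.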

Related articles

DROP: Dimensionality Reduction Optimization for Time Series

Dimensionality reduction is a critical step in analytics pipelines for high-volume, high-dimensional time series. Principal Component Analysis (PCA) is frequently the method of choice for many applications, yet is often prohibitively expensive for large datasets. Many theoretical means of accelerating PCA via sampling have recently been proposed, but these techniques typically treat PCA as a re...

GPU-Based Multiple Back Propagation for Big Data Problems

The big data era has become known for its abundance of rapidly generated data of varying formats and sizes. With this awareness, data analytics, and more specifically predictive analytics, has received increased attention lately. However, the massive sample sizes and high dimensionality peculiar to these datasets have challenged the overall performance of one of the most important co...

Dimensionality Reduction via Matrix Factorization for Predictive Modeling from Large, Sparse Behavioral Data

Matrix factorization is a popular technique for engineering features for use in predictive models; it is viewed as a key part of the predictive analytics process and is used in many different domain areas. The purpose of this paper is to investigate matrix-factorization-based dimensionality reduction as a design artifact in predictive analytics. With the rise in availability of large amounts of...

Efficient Two-Step Middle-Level Part Feature Extraction for Fine-Grained Visual Categorization

Fine-grained visual categorization (FGVC) has drawn increasing attention as an emerging research field in recent years. In contrast to generic-domain visual recognition, FGVC is characterized by high intra-class and subtle inter-class variations. To distinguish conceptually and visually similar categories, highly discriminative visual features must be extracted. Moreover, FGVC has highly special...

Dimensionality Reduction by Local Discriminative Gaussians

We present local discriminative Gaussian (LDG) dimensionality reduction, a supervised dimensionality reduction technique for classification. The LDG objective function is an approximation to the leave-one-out training error of a local quadratic discriminant analysis classifier, and thus acts locally to each training point in order to find a mapping where similar data can be discriminated from d...

Journal:
  • CoRR

Volume: abs/1710.09429

Publication date: 2017